Help needed with pyparsing [closed]

Posted by Zearin on Programmers
Published on 2012-04-10T15:20:19Z

Overview

So, I’m in the middle of refactoring a project, and I’m separating out a chunk of parsing code. The code I’m concerned with uses pyparsing.

I have a very poor understanding of pyparsing, even after spending a lot of time reading through the official documentation. I’m having trouble because (1) pyparsing takes a (deliberately) unorthodox approach to parsing, and (2) I’m working on code I didn’t write, which is poorly commented and contains a non-trivial set of existing grammars.

(I can’t get in touch with the original author, either.)

Failing Test

I’m using PyVows to test my code. One of my tests is as follows (I think this is clear even if you’re unfamiliar with PyVows; let me know if it isn’t):

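# Method of a PyVows context; parsed_input() comes from the project under test (not shown).
def test_multiline_command_ends(self, topic):
    output = parsed_input('multiline command ends\n\n', topic)
    # Raw string: each \n below is a literal backslash followed by "n",
    # which is how the newline tokens appear in the printed result list.
    expect(output).to_equal(
r'''['multiline', 'command ends', '\n', '\n']
- args: command ends
- multiline_command: multiline
- statement: ['multiline', 'command ends', '\n', '\n']
  - args: command ends
  - multiline_command: multiline
  - terminator: ['\n', '\n']
- terminator: ['\n', '\n']''')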

But when I run the test, I get the following in the terminal:

Failed Test Results

Expected topic("['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n  - args: command ends\n  - command: multiline") 
      to equal "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n  - args: command ends\n  - multiline_command: multiline\n  - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"

Note:

The expected output (the second string) shows extra backslashes because the failure message prints the repr() of each string, and the newline tokens inside the list are already written as \n. This is normal. The test passed before this piece of refactoring began.
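Where the doubled backslashes come from can be checked with plain Python. The failure message appears to show each string's repr() (note the double-quote delimiters), and repr() doubles any backslash already present in the text. A minimal standard-library sketch using a token list like the expected one:

```python
# A token list like the one in the expected output, holding two real newline tokens.
tokens = ['multiline', 'command ends', '\n', '\n']

# str() of a list shows each element's repr, so each newline token is written
# as a backslash followed by "n" (two characters), not as a line break.
text = str(tokens)
print(text)        # ['multiline', 'command ends', '\n', '\n']

# repr() of that text doubles each backslash, as in the failure message.
print(repr(text))  # "['multiline', 'command ends', '\\n', '\\n']"
```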

Expected Behavior

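The first string (the actual output) should match the second (the expected output), but it doesn’t. Specifically, the actual output is missing the two newline tokens, both in the top-level list and under statement; it has no terminator entries at all; and it labels the first token command instead of multiline_command.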

So I’m getting this:

"['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n  - args: command ends\n  - command: multiline"

When I should be getting this:

"['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n  - args: command ends\n  - multiline_command: multiline\n  - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"

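For comparison, here is a minimal sketch of a grammar whose token list matches the first line of the expected output, with the newlines collected under a terminator results name. It is hypothetical: the project's real grammar is not shown here, the element definitions below are my own assumptions (only the names multiline_command, args and terminator are borrowed from the expected output), and it leaves out the statement entry. It also assumes newlines are made significant before any element is built.

```python
import pyparsing as pp

# Make newlines significant; this must run before the elements below are built.
pp.ParserElement.setDefaultWhitespaceChars(" \t")

# Hypothetical stand-in for the project's grammar (NOT the original code):
# a command word, the rest of the line as its arguments, then the newlines.
command = pp.Word(pp.alphas)("multiline_command")
args = pp.Regex(r"[^\n]+")("args")
terminator = pp.OneOrMore(pp.Literal("\n"))("terminator")
statement = command + args + terminator

result = statement.parseString("multiline command ends\n\n")
print(result.asList())  # ['multiline', 'command ends', '\n', '\n']
print(result.dump())    # the list first, then one "- name: value" line per name
```

The exact text produced by dump() may vary between pyparsing versions (newer releases quote string values, for instance), so checking named results such as result["terminator"] directly is a more robust way to narrow down which part of the grammar stopped matching.
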
Earlier in the code, there is also this statement:

pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')

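…which, as I understand it, keeps pyparsing from skipping newlines as whitespace, so they can be matched as tokens. I think that should prevent exactly this kind of error, but I’m not sure.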
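One property of this call is worth checking: setDefaultWhitespaceChars changes the default only for parser elements created after it runs, and elements that already exist keep the whitespace set they were built with. If the refactoring changed the order so that some grammar elements are now constructed before this call (for example, at import time of a separated-out module), those elements would still skip newlines. That is a hypothesis to test, not a diagnosis. The sketch below shows the effect with two throwaway expressions; try_parse is a hypothetical helper for illustration.

```python
import pyparsing as pp

def try_parse(expr, text):
    """Hypothetical helper: return the parsed tokens as a list, or None on failure."""
    try:
        return expr.parseString(text).asList()
    except pp.ParseException:
        return None

# Start from pyparsing's stock default whitespace, which includes "\n".
pp.ParserElement.setDefaultWhitespaceChars(" \n\t\r")

# Built BEFORE the call below: the newline literal skips newlines as
# whitespace before trying to match, so it never sees the "\n".
early = pp.Word(pp.alphas) + pp.Literal("\n")

pp.ParserElement.setDefaultWhitespaceChars(" \t")

# Built AFTER the call: only spaces and tabs are skipped, so "\n" can match.
late = pp.Word(pp.alphas) + pp.Literal("\n")

print(try_parse(early, "ends\n"))  # None
print(try_parse(late, "ends\n"))   # ['ends', '\n']
```

Calling setDefaultWhitespaceChars before any grammar module builds its elements (or calling setWhitespaceChars on specific elements) avoids depending on import order.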

Even if the problem can’t be identified with certainty, simply narrowing down where the problem is would be a HUGE help.

Please let me know how I might take a step or two towards fixing this.